Explainable Machine Learning


Advanced Programming for Data Science


Lecturer: Dr. HAS Sothea

Content

  • Motivation & Introduction
    • Interpretable ML vs Explainable ML
  • Interpretable Models:
    • Linear Models
    • Decision Trees
  • Explainable Boosting Machine (EBM)
    • Bootstrap Aggregating: Bagging
    • Random Forest/Extremely Randomized Trees
    • Boosting: Adaboost & XGBoost
  • Model-Agnostic Explanation Methods
    • LIME: Local Interpretable Model-Agnostic Explanations
    • SHAP: SHapley Additive exPlanations
    • Feature Importance: Permutation Importance…

Motivation & Introduction

Open the “black box” of ML models

The “Black Box” Problem

  • Modern AI/ML models (like Deep Learning) are powerful but complex.
  • They often act as “Black Boxes”: we see the input and the output, but not the internal decision-making process.
  • In practice, this lack of transparency can lead to:
    • Mistrust from users and stakeholders.
    • Difficulty in debugging and improving models.
    • Challenges in regulatory compliance, especially in sensitive areas like healthcare and finance.
  • Therefore, explainability & interpretability may be more important than accuracy in some contexts.

Explainable Machine Learning (XML)

  • Explainable Machine Learning (XML) is a subfield of ML that focuses on developing models and techniques that provide insights into how and why certain predictions or decisions are made by the model.
  • The goal of XML is to make machine learning models more transparent, interpretable, and understandable to humans

Goals of Explainable ML

  • Transparency: Making the inner workings of ML models visible and understandable.
  • Trust: Building confidence in ML models by providing clear explanations for their predictions.
  • Accountability: Enabling stakeholders to hold models accountable for their decisions.
  • Debugging: Helping developers identify and fix issues in ML models.
  • Regulatory Compliance: Ensuring that ML models meet legal and ethical standards for explainability.

Interpretable ML vs Explainable ML

Interpretable ML

  • Interpretable ML focuses on building models that are inherently understandable by humans.
  • These models are designed to be simple and transparent, allowing users to easily grasp how predictions are made.
  • Examples of interpretable models include:
    • Linear Regression
    • Decision Trees
    • Rule-Based Models
  • The main advantage of interpretable models is that they provide clear insights into the decision-making process.

Interpretable ML vs Explainable ML

Explainable ML

  • Explainable ML focuses on providing explanations for complex, “black box” models that are not inherently interpretable.
  • These models may achieve higher accuracy but lack transparency.
  • Explainable ML techniques aim to shed light on how these models make predictions, often through post-hoc analysis.
  • Examples of explainable ML techniques include:
    • LIME (Local Interpretable Model-Agnostic Explanations)
    • SHAP (SHapley Additive exPlanations)
    • Feature Importance
  • The main advantage of explainable ML is that it allows users to understand and trust complex models.

Summary

Interpretable ML vs Explainable ML

Aspect Interpretability Explainability
Focus How the model works internally (inner mechanics and logic). Why a specific decision or prediction was made (the rationale or justification).
Transparency Inherent transparency in model design (“white box” models like linear models or decision trees). Often achieved through post-hoc (after-the-fact) methods applied to complex “black box” models (like deep learning).
Goal Global understanding of the entire model’s operation. Local explanations for individual predictions, which can be aggregated for general insights.
Methods Use of simple, inherently interpretable models. Use of techniques like LIME, SHAP, and Feature Importance to explain complex models.
Target Audience Primarily data scientists and developers for debugging and improvement. End-users, stakeholders, and regulators to build trust and ensure compliance.

I. Interpretable Models

Model is Interpretable by Design

  • EDA is crucial to understand data before modeling:
    • Identify problems (take note) for correct data preprocessing…
    • Detect patterns and relationships (input-output) for feature selection and engineering…
    • Guide model choice…
  • Model choice: interpretable models like Linear Models and Decision Trees are preferred when interpretability is a priority.
  • Interpretability: based on model structure and parameters.

1. Linear Models

  • The connection between input \(X\) and output prediction \(y\) is explicitly linear in parameters.
  • Coefficients indicate the strength and direction of relationships.

Regression Models:

  • Linear Regression: \[\hat{y} = \color{blue}{\beta_0} + \color{blue}{\beta_1}\text{x}_1 + \color{blue}{\beta_2}\text{x}_2 + ... + \color{blue}{\beta_d}\text{x}_d.\]

  • Polynomial regression: \[\hat{y}=\color{blue}{\beta_0}+\sum_{j=1}^d\color{blue}{\beta_j}\text{x}_j+\sum_{k=1}^{\color{red}{p}}\sum_{\ell,m=1}^d\color{blue}{\gamma_{\ell,m,k}}\text{x}_\ell^k\text{x}_m^{{\color{red}{p}}-k}.\]

  • Regularized versions (Ridge, Lasso, Elastic Net) add penalties to prevent overfitting.
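As a minimal sketch (on synthetic data, not the course dataset), fitting a linear regression recovers coefficients that can be read directly as effect sizes, which is what makes the model interpretable:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated as y = 2 + 3*x1 - 1.5*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2 + 3 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

lm = LinearRegression().fit(X, y)
# The fitted intercept and coefficients recover the generating parameters:
print(lm.intercept_, lm.coef_)  # close to 2 and [3, -1.5]
```

Each coefficient is the expected change in \(\hat{y}\) per unit increase of its feature, holding the others fixed.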

Classification Models:

  • Logistic Regression: \[\mathbb{P}(\hat{Y}=1|X=\text{x}) = \sigma(\color{blue}{\beta_0} + \sum_{j=1}^d\color{blue}{\beta_j}\text{x}_j).\]

  • Polynomial Logistic Regression: \[\mathbb{P}(\hat{Y}=1|X=\text{x})=\sigma(\color{blue}{\beta_0}+\sum_{j=1}^d\color{blue}{\beta_j}\text{x}_j+\sum_{k=1}^{\color{red}{p}}\sum_{\ell,m=1}^d\color{blue}{\gamma_{\ell,m,k}}\text{x}_\ell^k\text{x}_m^{{\color{red}{p}}-k}).\]

  • Regularized versions (Ridge, Lasso, Elastic Net) add penalties to prevent overfitting.
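Logistic regression is interpretable in the same way, on the log-odds scale. A hedged sketch on synthetic binary data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data generated with P(Y=1|x) = sigma(1 + 2*x)
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
p = 1 / (1 + np.exp(-(1 + 2 * x[:, 0])))
y = rng.binomial(1, p)

clf = LogisticRegression().fit(x, y)
# Coefficients live on the log-odds scale; exp(coef) is the odds ratio
# per unit increase in x.
print(clf.intercept_, clf.coef_)  # roughly 1 and 2
```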

1. Linear Models

1.1. Linear Regression

mpg cylinders displacement horsepower weight acceleration model year origin car name
0 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu
1 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320
2 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite
3 16.0 8 304.0 150 3433 12.0 70 1 amc rebel sst
4 17.0 8 302.0 140 3449 10.5 70 1 ford torino
  • Variables:
    • mpg: miles per gallon (target variable)
    • cylinders: number of cylinders
    • displacement: engine displacement
    • horsepower: engine horsepower
    • weight: vehicle weight
    • acceleration: time to accelerate from 0 to 60 mph
    • model year: year of manufacture
    • origin: origin of the car (1: USA, 2: Europe, 3: Asia).
  • Goal: Predict mpg based on other features.
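For experimentation, the preview rows above can be reconstructed as a small DataFrame (only the five rows shown here; the real Auto MPG dataset is much larger):

```python
import pandas as pd

# The five preview rows of the Auto MPG data shown above
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0],
    "cylinders": [8, 8, 8, 8, 8],
    "displacement": [307.0, 350.0, 318.0, 304.0, 302.0],
    "horsepower": [130, 165, 150, 150, 140],
    "weight": [3504, 3693, 3436, 3433, 3449],
    "acceleration": [12.0, 11.5, 11.0, 12.0, 10.5],
    "model year": [70, 70, 70, 70, 70],
    "origin": [1, 1, 1, 1, 1],
    "car name": ["chevrolet chevelle malibu", "buick skylark 320",
                 "plymouth satellite", "amc rebel sst", "ford torino"],
})
print(data.head())
```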

1. Linear Models

1.1. Linear Regression

1.1.1. EDA

  • Data types:
mpg cylinders displacement horsepower weight acceleration model year origin
Type float64 int64 float64 object int64 float64 int64 int64
  • Q1: Is there anything wrong with column type?
  • A1: Two main problems:
    • origin is qualitative, therefore should be “category/object”.
    • ⚠️ horsepower is quantitative, therefore should be “float/int”.
  • Modifying data type:
mpg cylinders displacement horsepower weight acceleration model year origin
Type float64 int64 float64 int64 int64 float64 int64 category
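A hedged sketch of the type fixes (in the raw Auto MPG file, horsepower commonly contains '?' placeholders, which is why pandas reads it as object; the toy frame below mimics that situation):

```python
import pandas as pd

# Toy frame mimicking the raw types: horsepower read as object
# because of '?' placeholders, origin read as int64
data = pd.DataFrame({
    "horsepower": ["130", "165", "?"],
    "origin": [1, 2, 3],
})

# Coerce horsepower to numeric ('?' becomes NaN), drop the missing rows,
# then fix both dtypes
data["horsepower"] = pd.to_numeric(data["horsepower"], errors="coerce")
data = data.dropna(subset=["horsepower"])
data["horsepower"] = data["horsepower"].astype("int64")
data["origin"] = data["origin"].astype("category")
print(data.dtypes)
```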

1. Linear Models

1.1. Linear Regression

1.1.1. EDA: Univariate analysis

Code
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
quan_vars = data.select_dtypes(include="number").columns
fig, axs = plt.subplots(2, 4, figsize=(10,4.5))
for i, va in enumerate(data.columns):
    if va in quan_vars:
        sns.histplot(data, x=va, kde=True, ax=axs[i//4, i%4], stat="proportion")
    else:
        if va != "car name":
            sns.countplot(data, x=va, ax=axs[i//4, i%4], stat="proportion")
            axs[i//4, i%4].bar_label(axs[i//4, i%4].containers[0], fmt="%.2f")
plt.tight_layout()
plt.show()

1. Linear Models

1.1. Linear Regression

1.1.1. EDA: Bivariate analysis

Code
import numpy as np
pair_grid = sns.PairGrid(data=data[quan_vars], height=0.7, aspect=2)

# Map plots to the lower triangle only
pair_grid.map_lower(sns.scatterplot)  # Scatterplots in the lower triangle
pair_grid.map_diag(sns.histplot)      # Histograms on the diagonal

def corr_func(x, y, **kws): 
    r1 = np.corrcoef(x, y)[0, 1]
    plt.gca().annotate(f"{r1:.2f}", xy=(0.5, 0.5), 
                       xycoords='axes fraction', 
                       ha='center', fontsize=20, color='#1d69d1')

pair_grid.map_upper(corr_func)
for ax in pair_grid.axes[:, 0]:  # Access the first column of axes (y-axis labels)
    ax.set_ylabel(ax.get_ylabel(), rotation=45, labelpad=20)
plt.tight_layout()
plt.show()

1. Linear Models

1.1. Linear Regression

1.1.1. EDA: Bivariate analysis

  • Does fuel-efficiency depend on the origin?
Code
_, axs = plt.subplots(1, 1, figsize=(7, 3.5))
sns.boxplot(data=data, x="origin", y="mpg", hue="origin", ax=axs)
plt.tight_layout()
plt.show()

1. Linear Models

1.1. Linear Regression

1.1.2. EDA: Summary

  • Weight shows the strongest negative correlation with mpg, followed by displacement, cylinders, and horsepower. These variables are significant in explaining variations in mpg.

  • These features are also highly correlated with each other, suggesting potential redundancy when included together in a predictive model.

  • Despite being a categorical variable, origin proves to be valuable for predicting mpg.

  • Many inputs are strongly and linearly related to the target mpg, which suggests the usefulness of Linear Models.

1. Linear Models

1.1. Linear Regression

1.1.2. Simple & Multiple LR

Simple Linear Regression

  • Model: \[\hat{y} = \color{blue}{\beta_0} + \color{blue}{\beta_1}\text{x}_1.\]
  • The best coefficients \(\color{blue}{\vec{\widehat{\beta}}}=[\color{blue}{\beta_0},\color{blue}{\beta_1}]\) are obtained by: \[\color{blue}{\vec{\widehat{\beta}}}=\arg\min_{\color{blue}{\vec{\beta}}}\frac{1}{2n}\sum_{i=1}^n(y_i-\color{blue}{\hat{y}_i})^2\]

Multiple Linear Regression

  • Model: \[\hat{y} = \color{blue}{\beta_0} + \color{blue}{\beta_1}\text{x}_1 + \color{blue}{\beta_2}\text{x}_2 + ... + \color{blue}{\beta_d}\text{x}_d.\]
  • The best coefficients \(\color{blue}{\vec{\widehat{\beta}}}=[\color{blue}{\beta_0},\dots,\color{blue}{\beta_d}]\) are obtained by: \[\color{blue}{\vec{\widehat{\beta}}}=\arg\min_{\color{blue}{\vec{\beta}}}\frac{1}{2n}\sum_{i=1}^n(y_i-\color{blue}{\hat{y}_i})^2\]
  • Analytic Solution: \(\color{blue}{\vec{\widehat{\beta}}}=\color{blue}{(X^TX)^{-1}X^Ty}.\)
  • Prediction: \(\hat{y}=\color{blue}{X\vec{\widehat{\beta}}}=\color{blue}{P}y\), where \(\color{blue}{P}:\) projection matrix onto \(\text{span}(X)\).
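The analytic solution and the projection view can be verified numerically; a small synthetic sketch:

```python
import numpy as np

# Synthetic design matrix with an explicit intercept column
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.01, size=50)

# Analytic solution: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Projection matrix P = X (X^T X)^{-1} X^T; predictions are y_hat = P y
P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y
assert np.allclose(y_hat, X @ beta_hat)  # both express the same projection
```

In practice one solves the normal equations (or uses QR/SVD) rather than forming \(P\) explicitly, which is \(n\times n\).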

1. Linear Models

1.1. Linear Regression

1.1.2. Simple & Multiple LR

mpg weight cylinders model year
0 18.0 3504 8 70
1 15.0 3693 8 70
2 18.0 3436 8 70
  • Multiple LR: mpg vs Cyl + Year.

1. Linear Models

1.1. Linear Regression

1.1.2. Simple & Multiple LR

  • R-squared: \(R^2=1-\frac{\text{RSS}}{\text{TSS}}=1-\frac{\sum_{i=1}^n(y_i-\hat{y}_i)^2}{\sum_{i=1}^n(y_i-\overline{y}_n)^2}=\frac{\color{red}{\text{V}(\hat{Y})}}{\color{blue}{\text{V}(Y)}}.\)

  • Adjusted R-squared: \(R^2_{\text{adj}}=1-\frac{n-1}{n-d-1}(1-R^2).\)

Code
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
def adj_r2(y_true, y_pred, d):
    n = len(y_true)
    return 1-(n-1)/(n-d-1)*(1-r2_score(y_true, y_pred))
cat = pd.get_dummies(data['origin'].astype(object), drop_first=True)*1
cat.columns = ['orig2', 'orig3']
X_full, y_full = pd.concat(
    [cat, 
     data.drop(columns=['mpg', 'car name', 'origin'])],
     axis=1), data['mpg']
lm_full = LinearRegression()
lm_full.fit(X_full, y_full)
pred_full = lm_full.predict(X_full)
X2 = data[['model year', 'cylinders']]
X1 = data[['weight']]

lm2 = LinearRegression()
lm1 = LinearRegression()
lm2.fit(X2, y_full)
lm1.fit(X1, y_full)

df_r2 = pd.DataFrame({
    'R2' : [r2_score(y_full, lm1.predict(X1)),
            r2_score(y_full, lm2.predict(X2)),
            r2_score(y_full, pred_full)],
    'Adj-R2' : [adj_r2(y_full, lm1.predict(X1), 1),
                adj_r2(y_full, lm2.predict(X2), 2),
                adj_r2(y_full, pred_full, X_full.shape[1])]
}, index=['LR1', 'LR2', 'LR-full'])
df_r2
R2 Adj-R2
LR1 0.692630 0.691842
LR2 0.715070 0.713606
LR-full 0.824199 0.820527
  • Can we do better?

1. Linear Models

1.1. Linear Regression

1.1.2. Simple & Multiple LR: \(t\)-test of coefficients

  • We can test \(H_0: \beta_j=0\) against \(H_1:\beta_j\neq 0\) using \(t\)-test.
  • If one of the two assumptions holds:
    • the sample size is large enough (\(n>30\)),
    • or the residuals follow a Gaussian distribution with constant variance,
    then, under \(H_0\), \[t_j=\frac{\widehat{\beta}_j}{s_{j}}\sim {\cal T}(n-d-1).\]
  • For a given level \(\alpha\), we REJECT \(H_0:\beta_j=0\) if \(|t_j|>t_{\alpha/2}\), where \(t_{\alpha/2}\) satisfies \(\mathbb{P}(|{\cal T}(n-d-1)|\leq t_{\alpha/2})=1-\alpha\).

1. Linear Models

1.1. Linear Regression

1.1.2. Simple & Multiple LR: \(t\)-test of coefficients

import statsmodels.api as sm
# `df` below is `data` with 'model year' renamed to 'year'
df = data.rename(columns={'model year': 'year'})
model = sm.OLS(df['mpg'], sm.add_constant(df[['cylinders', 'year']]))
results = model.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.715
Model:                            OLS   Adj. R-squared:                  0.714
Method:                 Least Squares   F-statistic:                     488.1
Date:                Fri, 05 Dec 2025   Prob (F-statistic):          8.84e-107
Time:                        10:53:25   Log-Likelihood:                -1115.1
No. Observations:                 392   AIC:                             2236.
Df Residuals:                     389   BIC:                             2248.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -17.1464      4.944     -3.468      0.001     -26.866      -7.426
cylinders     -2.9981      0.132    -22.718      0.000      -3.258      -2.739
year           0.7502      0.061     12.276      0.000       0.630       0.870
==============================================================================
Omnibus:                       24.502   Durbin-Watson:                   1.290
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               31.620
Skew:                           0.513   Prob(JB):                     1.36e-07
Kurtosis:                       3.940   Cond. No.                     1.79e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.79e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

1. Linear Models

1.1. Linear Regression

1.1.3. Polynomial Features

  • Predicting target using linear form of inputs may be unrealistic!
  • More complicated forms of inputs might be better for predicting the target!
  • Ex: mpg vs weight: \[\widehat{\text{mpg}}=\color{blue}{\beta_0}+\sum_{j=1}^p\color{blue}{\beta_j}\text{weight}^j.\]
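A hedged sketch of this idea on synthetic data standing in for mpg vs weight (a curved relationship), comparing a purely linear fit with a degree-2 polynomial fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved relationship between weight and mpg
rng = np.random.default_rng(0)
weight = rng.uniform(1500, 5000, size=300)
mpg = 60 - 0.02 * weight + 2e-6 * weight**2 + rng.normal(scale=1.0, size=300)

X = weight.reshape(-1, 1)
linear = LinearRegression().fit(X, mpg)

# Degree-2 polynomial features: [weight, weight^2]
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
poly = LinearRegression().fit(X_poly, mpg)

print(f"R2 linear:   {linear.score(X, mpg):.3f}")
print(f"R2 degree-2: {poly.score(X_poly, mpg):.3f}")
```

Since the polynomial model nests the linear one, its training R² can only improve; the question (addressed next via regularization) is whether it generalizes.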

1. Linear Models

1.1. Linear Regression

1.1.3. Polynomial Features

1. Linear Models

1.1. Linear Regression

1.1.3. Polynomial Features

Code
X_full2 = pd.concat([X_full, (data['weight'] ** 2).rename('weight2')], axis=1)
lm_full2 = LinearRegression()
lm_full2.fit(X_full2, y_full)

df_r2 = pd.concat([
    df_r2,
    pd.DataFrame({
    'R2' : [r2_score(y_full, lm_full2.predict(X_full2))],
    'Adj-R2' : [adj_r2(y_full, lm_full2.predict(X_full2), X_full2.shape[1])]}, 
    index=['LR-full-Poly2'])])
df_r2
R2 Adj-R2
LR1 0.692630 0.691842
LR2 0.715070 0.713606
LR-full 0.824199 0.820527
LR-full-Poly2 0.858133 0.854791
  • Can you make it even better?

1. Linear Models

1.1. Linear Regression

1.1.4. Regularization

  • High-degree polynomials often lead to overfitting (an overly flexible model).
  • Regularization is a common method to control this flexibility by constraining the magnitude of the coefficients.

1. Linear Models

1.1. Linear Regression

1.1.4. Regularization: Ridge Regression

  • Model: \(\hat{y}=\color{blue}{\beta_0}+\color{blue}{\beta_1}x_1+\dots+\color{blue}{\beta_d}x_d\),

  • Objective: Search for \(\color{blue}{\vec{\beta}=[\beta_0,\dots,\beta_d]}\) minimizing the following loss function for some \(\color{green}{\alpha}>0\): \[{\cal L}_{\text{ridge}}(\vec{\beta})=\color{red}{\underbrace{\frac{1}{n}\sum_{i=1}^n(y_i-\widehat{y}_i)^2}_{\text{MSE}}}+\color{green}{\alpha}\color{blue}{\underbrace{\sum_{j=0}^{d}\beta_j^2}_{\text{Magnitude}}}.\]

  • Recall: SLR & MLR seek to minimize only MSE.

1. Linear Models

1.1. Linear Regression

1.1.4. Regularization: Ridge Regression

  • Large \(\color{green}{\alpha}\Rightarrow\) strong penalty \(\Rightarrow\) small \(\vec{\beta}\).
  • Small \(\color{green}{\alpha}\Rightarrow\) weak penalty \(\Rightarrow\) freer \(\vec{\beta}\).
  • 🔑 Objective: Learn the best \(\color{green}{\alpha}>0\).
  • Loss: \({\cal L}_{\text{ridge}}(\vec{\beta})=\color{red}{\underbrace{\frac{1}{n}\sum_{i=1}^n(y_i-\widehat{y}_i)^2}_{\text{MSE}}}+\color{green}{\alpha}\color{blue}{\underbrace{\sum_{j=0}^{d}\beta_j^2}_{\text{Magnitude}}}.\)

1. Linear Models

1.1. Linear Regression

1.1.4. Regularization: Ridge Regression

  • Consider mpg vs polynomials of horsepower.

1. Linear Models

1.1. Linear Regression

1.1.4. Regularization: Ridge Regression

How to find a suitable regularization strength \(\color{green}{\alpha}\)?

Tuning Regularization Strength \(\color{green}{\alpha}\) Using \(K\)-fold Cross-Validation

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
# Data
X, y = data[["horsepower"]], data['mpg']
poly = PolynomialFeatures(degree=10)
X_poly = poly.fit_transform(X)
# List of all regularization strengths alpha to search over
alphas = list(np.linspace(0.01, 100000, 100))
# List to store all losses
loss = []
coefficients = {f'alpha={alpha}': [] for alpha in alphas}
for alp in alphas:
    model = Ridge(alpha=alp)
    score = -cross_val_score(model, X_poly, y, cv=10, 
                scoring='neg_mean_absolute_error').mean()
    loss.append(score)
    # Fit
    model.fit(X_poly, y)
    coefficients[f'alpha={alp}'] = model.coef_

1. Linear Models

1.1. Linear Regression

1.1.4. Regularization: Ridge Regression

How to find a suitable regularization strength \(\color{green}{\alpha}\)?

Tuning Regularization Strength \(\color{green}{\alpha}\) Using \(K\)-fold Cross-Validation

1. Linear Models

1.1. Linear Regression

1.1.4. Regularization: Ridge Regression

Pros

  • It works well when there are inputs that are approximately linearly related with the target.
  • It helps stabilize the estimates when inputs are highly correlated.
  • It can prevent overfitting and should be used along with polynomial features.
  • It is effective when the number of inputs exceeds the number of observations.
  • Highly interpretable: coefficients are the direct influence of each term onto the target.

Cons

  • It does not work well when the input-output relationships are highly non-linear.
  • It may introduce bias into the coefficient estimates.
  • It does not perform feature selection.

1. Linear Models

1.1. Linear Regression

1.1.5. Regularization: Lasso Regression

  • Model: \(\hat{y}=\color{blue}{\beta_0}+\color{blue}{\beta_1}x_1+\dots+\color{blue}{\beta_d}x_d\),
  • Objective: Search for \(\vec{\beta}=[\beta_0,\dots,\beta_d]\) minimizing the following loss function for some \(\color{green}{\alpha}>0\): \[{\cal L}_{\text{lasso}}(\vec{\beta})=\color{red}{\underbrace{\frac{1}{n}\sum_{i=1}^n(y_i-\widehat{y}_i)^2}_{\text{MSE}}}+\color{green}{\alpha}\color{blue}{\underbrace{\sum_{j=0}^{d}|\beta_j|}_{\text{Magnitude}}}.\]

1. Linear Models

1.1. Linear Regression

1.1.5. Regularization: Lasso Regression

  • Large \(\color{green}{\alpha}\Rightarrow\) strong penalty \(\Rightarrow\) small \(\vec{\beta}\).
  • Small \(\color{green}{\alpha}\Rightarrow\) weak penalty \(\Rightarrow\) freer \(\vec{\beta}\).
  • 🔑 Objective: Learn the best \(\color{green}{\alpha}>0\).
  • Loss: \({\cal L}_{\text{lasso}}(\vec{\beta})=\color{red}{\underbrace{\frac{1}{n}\sum_{i=1}^n(y_i-\widehat{y}_i)^2}_{\text{MSE}}}+\color{green}{\alpha}\color{blue}{\underbrace{\sum_{j=0}^{d}|\beta_j|}_{\text{Magnitude}}}.\)

1. Linear Models

1.1. Linear Regression

1.1.5. Regularization: Lasso Regression

Tuning Regularization Strength \(\color{green}{\alpha}\) Using \(K\)-fold Cross-Validation
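As a hedged sketch (synthetic data), scikit-learn's `LassoCV` tunes \(\alpha\) by \(K\)-fold cross-validation directly; scaling comes first, since Lasso is sensitive to feature scale:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only 2 of 10 features are informative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Standardize, then let LassoCV pick alpha by 5-fold CV
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)
lasso = model.named_steps["lassocv"]
print(f"best alpha: {lasso.alpha_:.4f}")
print(f"non-zero coefficients: {np.sum(lasso.coef_ != 0)}")
```

The informative features keep large coefficients; uninformative ones are shrunk toward (or exactly to) zero, which is the feature-selection behavior discussed below.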

1. Linear Models

1.1. Linear Regression

1.1.5. Regularization: Lasso Regression

Pros

  • Lasso inherently performs feature selection when increasing regularization parameter \(\alpha\) (less important variables are forced to be completely \(0\)).
  • It works well when there are many inputs (high-dimensional data) and some highly correlated with the target.
  • It can handle collinearities (many redundant inputs).
  • It can prevent overfitting and offers high interpretability.

Cons

  • It does not work well when the input-output relationships are highly non-linear.
  • It may introduce bias into the coefficient estimates.
  • It is sensitive to the scale of the data, so proper scaling of predictors is crucial before applying the method.

1. Linear Models

1.2. Logistic Regression
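This slide's demo is not reproduced here; as a hedged, synthetic stand-in in the spirit of the Auto MPG data, logistic regression could classify US-made vs other cars from (scaled) weight:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical task: classify US-made (1) vs other (0) from weight.
# Synthetic data: heavier cars are more likely US-made.
rng = np.random.default_rng(0)
weight = rng.uniform(1500, 5000, size=400)
x = (weight - 3000) / 1000          # centered, in units of 1000 lbs
p_us = 1 / (1 + np.exp(-2.5 * x))   # true log-odds slope = 2.5
origin_us = rng.binomial(1, p_us)

clf = LogisticRegression().fit(x.reshape(-1, 1), origin_us)
# Positive coefficient: heavier cars raise the log-odds of being US-made
print(clf.intercept_, clf.coef_)
```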

1. Linear Models

1.3. Decision Trees
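A minimal sketch of why a shallow tree is interpretable by design: its fitted structure prints directly as human-readable if/else rules (using scikit-learn's bundled iris data to stay self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# The whole decision process of the model, as nested threshold rules
print(export_text(tree, feature_names=list(iris.feature_names)))
```

Each path from root to leaf is one rule; depth controls the trade-off between accuracy and readability.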

Interpretable Model Summary:

  • In any ML project, understanding the data via EDA is the essential first step.
  • Insights from EDA can guide us toward suitable models and tell us what to watch out for.
  • In interpretable ML, models should be transparent and interpretable: linear models & trees.
  • Refinement is based on key problems detected during the EDA step.
  • Interpretation is done directly via the parameters and model structure.

II. Explainable Models

1. Local Model-agnostic methods

Why is interpretability alone not enough?

  • Linear models or decision trees may be transparent but less accurate on some tasks.
  • This may require more complex models such as:
    • Random forest
    • XGBoost
    • Deep neural networks…
  • Post-hoc methods are used to explain the behavior of such models:
    • LIME: Local Interpretable Model-Agnostic Explanations
    • SHAP: SHapley Additive exPlanations
    • Feature Importance: global summary of feature impact

1. Local Model-agnostic methods

1.1. LIME: Local Interpretable Model-agnostic Explanations

  • Goal: Explains individual predictions of “black box” machine learning models.
  • Model-Agnostic: Works on any model architecture (Neural Networks, XGBoost, SVMs) without needing access to internal weights.
  • Local Fidelity: Focuses on explaining the model’s behavior only around the specific instance being predicted, not the global logic.
  • Interpretable Output: Approximates complex decision boundaries using simple, human-readable models (Linear Regression or Decision Trees).

1. Local Model-agnostic methods

1.1. LIME: Local Interpretable Model-agnostic Explanations

  • Global vs. Local: A complex model’s boundary is non-linear and jagged globally, but often looks linear when “zoomed in” on a single point.
  • The Strategy: LIME fits a straight line (linear model) tangent to the complex curve at the specific point of interest.
  • The Trade-off: Optimizes for Fidelity (matching the black box prediction locally) and Simplicity (using few features).
  • Interpretation: Relies on interpretable models.

1. Local Model-agnostic methods

1.1. LIME: Local Interpretable Model-agnostic Explanations

  • The image was predicted by Google’s Inception model to be the ‘Electric guitar’ class with \(p=0.32\), followed by ‘Acoustic guitar’ with \(p=0.24\) and ‘Labrador’ with \(p=0.21\).

1. Local Model-agnostic methods

1.1. LIME: Local Interpretable Model-agnostic Explanations

  • Given a complex model \(f\) and data to be explained (\(\color{blue}{x}\)).
  • The explanation \(\epsilon(\color{blue}{x})\) is an interpretable model \(\color{green}{g}\) s.t., \[\epsilon(\color{blue}{x})=\arg\min_{\color{green}{g}\in G}\mathcal{L}(f,\color{green}{g},\pi_{\color{blue}{x}})+\Omega(\color{green}{g})\text{ with }\] \[\text{Fidelity Loss }{\cal L}(f,\color{green}{g},\pi_{\color{blue}{x}})=\sum_{\color{purple}{z},\color{red}{z'}}\pi_{\color{blue}{x}}(\color{purple}{z})(f(\color{purple}{z})-\color{green}{g}(\color{red}{z'}))^2.\]
    • \(\color{red}{z'}\): A perturbed sample (a binary vector).
    • \(\color{purple}{z}\): The converted sample from \(\color{red}{z'}\) for model \(f\).
    • \(f(\color{purple}{z})\): The prediction of the black box model.
    • \(\color{green}{g}(\color{red}{z'})\): The prediction of the linear explanation.
    • \(\pi_{\color{blue}{x}}(\color{purple}{z})\): The weight, commonly exponential kernel: \[\pi_{\color{blue}{x}}(\color{purple}{z})=\exp\left(-\|\color{blue}{x}-\color{purple}{z}\|^2/\sigma^2\right).\]
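The objective above can be sketched end-to-end without the lime library: perturb around \(x\), weight samples with the exponential kernel \(\pi_x\), and fit a weighted linear surrogate \(g\) (the black box \(f\) and all constants here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

def f(X):
    # Toy non-linear "black box": sin in feature 0, quadratic in feature 1
    return np.sin(X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(0)
x = np.array([0.5, 1.0])                          # instance to explain
Z = x + rng.normal(scale=0.3, size=(500, 2))      # perturbed samples around x
pi = np.exp(-np.sum((Z - x) ** 2, axis=1) / 0.5)  # exponential kernel weights

# Weighted linear surrogate g: its coefficients are the local explanation
g = Ridge(alpha=1e-3).fit(Z, f(Z), sample_weight=pi)
print(g.coef_)  # close to the local slopes [cos(0.5), 2*1.0]
```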

1. Local Model-agnostic methods

1.1. LIME: The Surrogate Model (\(\color{green}{g}\))

  • The surrogate model \(\color{green}{g}\) represents \(f\) locally around the instance \(\color{blue}{x}\).
  • The surrogate is interpretable, e.g., linear: \[\color{green}{g}(x) = \color{green}{w_0} + \sum \color{green}{w_i}\color{blue}{x_i}.\]
  • Magnitude (\(|\color{green}{w}|\)): The strength of influence.
  • Positive (+): Feature supports the prediction.
  • Negative (-): Feature opposes the prediction.
  • Context: Explains specific instances, not global logic.

Local Linear Approximation

LIME Feature Importance

1. Local Model-agnostic methods

1.1. LIME: The Surrogate Model (\(\color{green}{g}\))

Code
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from lime import lime_image
from skimage.segmentation import mark_boundaries
import pandas as pd

# 1. Load and Preprocess Data (Fashion MNIST)
# ---------------------------------------------------------
path = '/Users/hassothea/.cache/kagglehub/datasets/zalando-research/fashionmnist/versions/4'

data_test = pd.read_csv(path + '/fashion-mnist_test.csv')
data_train = pd.read_csv(path + '/fashion-mnist_train.csv')
x_train, y_train = data_train.iloc[:,1:], data_train['label']
x_test, y_test = data_test.iloc[:,1:], data_test['label']

# Normalize and reshape the flat 784-pixel rows to (28, 28, 1) for the model
x_train = x_train.to_numpy().astype("float32").reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.to_numpy().astype("float32").reshape(-1, 28, 28, 1) / 255.0

# Class names for Fashion MNIST
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# 2. Build or Load Your Model
# ---------------------------------------------------------
# (Here we build a quick dummy model for demonstration)
import os

# Define a filename for your model
model_path = './models/fashion_mnist_lime_model.keras'

# Check if the model already exists
if os.path.exists(model_path):
    print(f"Found saved model at '{model_path}'. Loading...")
    model = keras.models.load_model(model_path)
else:
    print("No saved model found. Training now...")
    # Build the model
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    
    model.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy', 
                  metrics=['accuracy'])
    
    # Train the model
    model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)
    
    # Save the model for next time
    model.save(model_path)

# 3. Define the Prediction Wrapper for LIME
# ---------------------------------------------------------
def to_grayscale_predict(images_rgb):
    """
    LIME passes 3-channel RGB images.
    We must convert them back to 1-channel grayscale for the model.
    """
    # 1. Convert RGB (N, 28, 28, 3) -> Grayscale (N, 28, 28, 1)
    # Since LIME just replicates the gray channel, we can just take the first channel.
    images_gray = images_rgb[:, :, :, :1] 
    
    # 2. Predict
    return model.predict(images_gray, verbose=0)

# 4. Setup LIME Explainer
# ---------------------------------------------------------
explainer = lime_image.LimeImageExplainer()

# Select a test image to explain
idx = 89  # Change this index to see different examples
img_to_explain_gray = x_test[idx,:].reshape(28,28,1) # Shape (28, 28, 1)

# IMPORTANT: LIME expects the input image to have 3 channels (RGB).
# We duplicate the grayscale channel 3 times to satisfy LIME.
img_to_explain_rgb = np.repeat(img_to_explain_gray, 3, axis=2) 

# Generate Explanation
# top_labels=5: analyze the top 5 predicted classes
# hide_color=0: replace perturbed superpixels with black (0)
# num_samples=1000: number of perturbations to generate

# 1. Pixel-wise segmentation function (crucial for 28x28 images)
def pixel_segmentation(image):
    row_idx, col_idx = np.indices((28, 28))
    return row_idx * 28 + col_idx

# 2. Generate the Explanation
# We ask LIME to find the top 5 labels
explanation = explainer.explain_instance(
    img_to_explain_rgb, 
    to_grayscale_predict,
    top_labels=5, 
    hide_color=0,           # Mask with black (0) for MNIST
    num_samples=1000,
    segmentation_fn=pixel_segmentation
)

# 3. Retrieve the Label from the Explanation Object
# ---------------------------------------------------------
# explanation.top_labels contains the class indices sorted by model probability.
# Index [0] is the class with the highest probability (the winner).
# Index [1] would be the runner-up, etc.
target_label = explanation.top_labels[0] 

print(f"LIME identified class {target_label} ({class_names[target_label]}) as the top prediction.")

# 4. Get image and mask using that internal label
temp, mask = explanation.get_image_and_mask(
    target_label,           # Use the label extracted from explanation
    positive_only=False, 
    num_features=5, 
    hide_rest=False
)

# 5. Plot
plt.figure(figsize=(8, 4))

# Original
plt.subplot(1, 2, 1)
plt.imshow(img_to_explain_gray.squeeze(), cmap='BuGn')
plt.title(f"True Label: {class_names[y_test[idx]]}")
plt.axis('off')

# Explanation
plt.subplot(1, 2, 2)
plt.imshow(mark_boundaries(temp, mask), cmap='BuGn')
plt.title(f"Explanation for '{class_names[target_label]}'")
plt.axis('off')

plt.tight_layout()
plt.show()

LIME explains an image of a sneaker from the Fashion MNIST dataset
predicted by a 1-layer CNN with 32 filters.

1. Local Model-agnostic methods

1.2. SHAP: SHapley Additive exPlanations

  • LIME: reflects a rough estimate of the local behavior of the “black box” model when making predictions on specific instances.
  • It can be inconsistent, as some important features may be underrated due to the varying flexibility of the black box model at different locations.
  • SHAP: carefully calculates the contribution of each feature using Game Theory.
  • It is proven to be the only method that satisfies certain fairness properties when rating the influence of features.

1. Local Model-agnostic methods

1.2. SHAP: SHapley Additive exPlanations

  • Given a complex model \(f\) and data \(\color{blue}{x}\), the importance of feature \(\color{red}{j}\) denoted by \((\color{green}{\phi}_{\color{red}{j}})\): \[\begin{align*}\color{green}{\phi}_{\color{red}{j}}&=\frac{1}{|F|}\sum_{S\subseteq F\setminus\{\color{red}{j}\}}\frac{f(S\cup \{\color{red}{j}\})-f(S)}{C(|F|-1,|S|)}\\ &=\sum_{S\subseteq F\setminus\{\color{red}{j}\}}\underbrace{\frac{|S|!(|F|-|S|-1)!}{|F|!}}_{\text{Combination weight}}[\underbrace{f(S\cup \{\color{red}{j}\})-f(S)}_{\text{Marginal contribution}}].\end{align*}\]

Wikipedia: Lloyd Shapley in 2012.

🔑 Interpretation: Feature \(\color{red}{j}\)’s marginal contribution to a set \(S\) is the prediction change \(f(S\cup \{\color{red}{j}\})-f(S)\). The SHAP value averages this contribution across all possible combinations of features.
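For a small number of features, the formula above can be evaluated directly by enumerating all subsets \(S\). Below is a minimal sketch of this brute-force computation; the value function `contrib` is a toy additive model chosen for illustration (the names `shapley_values` and `contrib` are not from the slides):

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Brute-force Shapley values: for each feature j, average its marginal
    contribution over all subsets S of the remaining features, weighted by
    |S|!(|F|-|S|-1)!/|F|! as in the formula above. Exponential cost: toy use only."""
    n = len(features)
    phi = {}
    for j in features:
        rest = [f for f in features if f != j]
        total = 0.0
        for size in range(n):
            for S in combinations(rest, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value_fn(set(S) | {j}) - value_fn(set(S)))
        phi[j] = total
    return phi

# Toy additive value function: each feature's Shapley value should
# recover exactly its own contribution.
contrib = {"x1": 2.0, "x2": -1.0, "x3": 0.5}
v = lambda S: sum(contrib[f] for f in S)

phi = shapley_values(v, list(contrib))
print(phi)  # each phi[j] recovers contrib[j] for an additive value function
```

For an additive value function the marginal contribution of \(j\) is the same for every \(S\), so the weighted average collapses to that constant; interactions between features are what make the full subset enumeration necessary.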

1. Local Model-agnostic methods

1.2. SHAP: SHapley Additive exPlanations

  • SHAP satisfies the following Local Accuracy Property: \[\underbrace{f(\color{blue}{x})}_{\text{Prediction of }\color{blue}{x}}=\underbrace{\color{green}{\phi}_{0}}_{\text{Baseline prediction}}+\sum_{j=1}^{|F|}\underbrace{\color{green}{\phi}_{\color{red}{j}}}_{\text{SHAP value of feature }\color{red}{j}}.\]
  • This property is known as Efficiency in Game Theory; it was adapted as Local Accuracy by Lundberg & Lee (2017) in the SHAP framework.
  • Overall importance: the average of the absolute SHAP values over all training points for any feature \(\color{red}{j}\): \[\color{purple}{\text{SHAP}}(\color{red}{j})=\frac{1}{n}\sum_{i=1}^n|\color{green}{\phi}_{\color{red}{j}}(\color{blue}{x_i})|.\]
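The Local Accuracy property can be verified numerically on a toy value function with an interaction term, where attribution is no longer trivial. A minimal sketch (the value function `v` and the feature names are illustrative assumptions, not from the slides):

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Brute-force Shapley values over all subsets (exponential: toy use only)."""
    n = len(features)
    phi = {}
    for j in features:
        rest = [f for f in features if f != j]
        phi[j] = sum(
            factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            * (value_fn(set(S) | {j}) - value_fn(set(S)))
            for size in range(n) for S in combinations(rest, size)
        )
    return phi

# Hypothetical value function: a baseline of 1.0, a main effect of x1,
# and an interaction that only fires when x1 and x2 are both present.
v = lambda S: 1.0 + (2.0 if "x1" in S else 0.0) + (1.5 if {"x1", "x2"} <= S else 0.0)

features = ["x1", "x2"]
phi0 = v(set())                 # baseline prediction phi_0 = v(empty set)
phi = shapley_values(v, features)

# Local Accuracy: baseline + sum of SHAP values recovers the full prediction.
total = phi0 + sum(phi.values())
print(total, v(set(features)))  # both print 4.5
```

Note how the interaction credit of 1.5 is split between the two features rather than assigned to either one alone; this is the “fair allocation” behavior that distinguishes SHAP from a simple one-feature-at-a-time attribution.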


1. Local Model-agnostic methods

1.2. LIME vs. SHAP

Key Differences Between LIME and SHAP

| Feature | LIME | SHAP |
|---|---|---|
| Philosophy | Local Linear Approximation | Cooperative Game Theory |
| Stability | Variable (run it twice, get slightly different results) | Stable (theoretical guarantee) |
| Global View | Hard to aggregate (weights aren’t comparable) | Easy: average SHAP values = global feature importance |
| Speed | Very Fast | Slow / Computationally Intensive |
| Best Used For | Quick debugging, real-time explanation, image boundaries | High-stakes decisions (Finance/Health), comparing drivers across a dataset |

1. Local Model-agnostic methods

1.2. SHAP: Beeswarm plot

2. Global Model-agnostic methods

Permutation Feature Importance

  • LIME and SHAP mainly focus on local interpretability of the (black box) model \(f\).
  • Permutation Feature Importance (PFI): focus on global interpretability of the model.
  • It aims at quantifying the impact of each feature \(j\) on model performance.

Algorithm

  • Train: build \(f\) on the whole data.
  • Baseline: Compute model error \(\color{red}{\text{e}_0}\) (e.g. RMSE).
  • Permutation: For each feature \(\color{red}{j}\):
    • Randomly shuffle the column \(X_j\to\color{red}{\tilde{X}_j}\).
    • Re-score the already-trained \(f\) on the permuted data \(D_j=[X_1,\dots,\color{red}{\tilde{X}_j},\dots,X_d]\) (no retraining).
    • Compute model error \(\color{red}{\text{e}_j}\).
  • Feature Importance: \(\color{green}{\text{FI}}(\color{red}{j})=\frac{\color{red}{\text{e}_j}-\color{red}{\text{e}_0}}{\color{red}{\text{e}_0}}.\)
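The algorithm above is short enough to implement by hand. A minimal sketch with NumPy on synthetic data (a simple least-squares model stands in for the black box \(f\); all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on X[:, 0] and not at all on X[:, 1].
n = 2000
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=n)

# Train: fit a linear model f once on the whole data.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
rmse = lambda y_true, y_pred: np.sqrt(np.mean((y_true - y_pred) ** 2))

# Baseline: model error e_0.
e0 = rmse(y, X @ w)

# Permutation: shuffle one column at a time, re-score the SAME trained model.
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    ej = rmse(y, Xp @ w)
    importance.append((ej - e0) / e0)   # FI(j) = (e_j - e_0) / e_0

print(importance)  # feature 0 should dominate, feature 1 should be near zero
```

Because only the column of one feature is shuffled, any increase in error is attributable to the model’s reliance on that feature alone.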

2. Global Model-agnostic methods

Permutation Feature Importance [Christoph Molnar]

Summary: LIME, SHAP, and PFI

1. LIME:

(Local Focus)

“Why this prediction?”

  • Pros:
    • Intuitive: Explains via simple local surrogates.
    • Fast: Good for checking single instances quickly.
  • Cons:
    • Unstable: Sampling noise can yield different explanations for the same point.
    • Linearity: May fail on complex local manifolds.

2. SHAP:

(Theoretical Bridge)

“Fair contribution allocation”

  • Pros:
    • Consistent: Solid Game Theory foundation (Shapley Values).
    • Flexible: Aggregates from local to global views.
  • Cons:
    • Slow: Computationally expensive (often requires approximation).
    • Complex: Harder to explain the math to non-technical stakeholders.

3. PFI:

(Global Focus)

“What matters overall?”

  • Pros:
    • Generalization: Measures impact on test set error.
    • Unbiased: Avoids cardinality bias of standard tree importance.
  • Cons:
    • Global Only: Does not explain specific decisions.
    • Cost: Repeated permutation and scoring can be slow with many features.

⚠️ Critical Watch-outs & Caveats

The Danger Zone

  1. The Correlation Trap: (Especially PFI) If Feature A and B are correlated, shuffling A may barely drop performance because B carries the same signal. This leads to underestimating the importance of both.
  2. Background Data: (SHAP/LIME) Your choice of the “reference dataset” drastically changes the explanation.
  3. Correlation \(\neq\) Causality: These methods explain the model’s logic, not necessarily the real world’s physics.
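The correlation trap can be illustrated numerically. In the sketch below (synthetic data, linear model; all names are illustrative), the same signal is either split across two nearly identical features or carried by a single feature, and PFI rates each correlated copy well below the lone feature:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
driver = rng.normal(size=n)

def pfi(X, y, rng):
    """Permutation feature importance for a least-squares fit (no retraining)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    e0 = np.sqrt(np.mean((y - X @ w) ** 2))
    out = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        out.append((np.sqrt(np.mean((y - Xp @ w) ** 2)) - e0) / e0)
    return out

# Case A: the signal is duplicated into two highly correlated features.
x1 = driver
x2 = driver + 0.01 * rng.normal(size=n)
y = x1 + x2 + 0.1 * rng.normal(size=n)
fi_corr = pfi(np.column_stack([x1, x2]), y, rng)

# Case B: the same total signal carried by one feature.
fi_single = pfi(driver.reshape(-1, 1), y, rng)

print(fi_corr, fi_single)  # each correlated copy scores below the single feature
```

Here the underestimation comes from the credit being split between the two correlated copies; with more flexible models (e.g. trees) the effect can be even stronger, since the model may lean almost entirely on whichever copy was not shuffled.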

🥳 Yeahhhh….









Let’s Party… 🥂